Military decision making relies heavily upon the intuitive
judgments  and  educated  guesses  of  decision  makers  and  their 
advisors.  The  critical  role  of  intuitive  judgments  makes  it 
important  to  study  the  factors  that  limit  their  accuracy  and 
to  seek  ways  of  improving  these  judgments.  To  that  end,  the 
present  study  examines  one  of  the  more  potent  errors  of  judgment 
that  our  research  has  discovered  — the  base-rate  fallacy. 

Many  situations  present  the  decision  maker  with  two  kinds 
of  information:  background  or  base-rate  information  about  how 
things  usually  are  in  such  situations  and  indicator  or  diagnostic 
information  telling  how  things  appear  to  be  in  the  particular 
situation.  Unless  the  diagnostic  information  is  extremely  good, 
the  usual  state  (base-rate)  should  be  an  important  guide  to 
judging how they are at the moment. A statistical formula,
Bayes' rule, tells exactly how these two kinds of information
should be combined.

Failure  to  consider  background  information  in  situations 
in  which  it  is  actually  very  relevant  is  called  the  base-rate 
fallacy.  Such  failure  appears  to  be  very  widespread  and  to 
affect  even  trained  statisticians  when  they  rely  on  intuition 
rather  than  calculation. 

This  paper  tests  the  generality  of  the  base-rate  fallacy 
and  examines  a number  of  explanations  for  it.  The  examination 
indicates  that  the  effect  is  not  an  artifact  of  how  responses  are 
elicited  nor  of  the  order  in  which  information  is  presented.  Nor 
is  it  due  to  simple  misreading  of  the  problem.  It  cannot  be 
attributed to an inherent inability to integrate multiple sources of
uncertainty. Base rates are apparently ignored because subjects
feel they should be ignored; in essence, base rates often seem
irrelevant when they should be given great weight. This paper
suggests  some  problem  characteristics  that  seem  to  affect 
the  perceived  relevance  of  base-rate  information  and  the  likelihood 
that  it  will  not  be  ignored.  One  hypothesis,  tested  and  confirmed 
in  this  study,  is  that  base-rates  will  be  used  if  they  can  be 
interpreted  as  relating  causally  to  the  target  judgment. 

In  sum,  this  study  indicates  the  conditions  most  likely 
to  produce  the  base-rate  fallacy.  The  knowledge  obtained  here, 
leading  towards  an  understanding  of  when  base  rates  are  and 
are  not  viewed  as  relevant,  has  direct  implications  for  training 
people to overcome this bias.

1. INTRODUCTION

1.1 The Base-Rate Fallacy: Examples and Implications

Problem 1: Two cab companies operate in a given city,
the Blue and the Green (according to the color of cab
they run). 85% of the cabs in the city are Blue, and
the remaining 15% are Green.

A cab  was  involved  in  a hit-and-run  accident  at 
night.  A witness  later  identified  the  cab  as  a 
Green  cab. 

The  court  tested  the  witness'  ability  to  distinguish 
between  Blue  and  Green  cabs  under  nighttime  visibility 
conditions.  It  found  the  witness  able  to  identify  each 
color  correctly  about  80%  of  the  time,  but  confusing  it 
with  the  other  color  about  20%  of  the  time.  What  do 
you  think  are  the  chances  that  the  errant  cab  was  indeed 
Green,  as  the  witness  claimed? 

(following  Kahneman  & Tversky,  1972b) 

This is a paradigmatic Bayesian inference problem. It
contains two kinds of information. One is in the form of
background data on the color distribution of cabs in the city.
We shall call this base-rate information. The second, rendered
by the witness, relates specifically to the cab in question, and
we shall call this indicant or diagnostic information.

The proper, normative way to combine the inferential
impacts of base-rate evidence and diagnostic evidence is given
by Bayes' rule. In odds form, this rule can be written as
$\Omega = Q \cdot R$, where $\Omega$ denotes the posterior odds in
favor of a particular inference; $Q$ denotes the prior odds in favor
of that particular inference; and $R$ denotes the likelihood ratio
for that inference.
In the cab example above, we are interested in the probability,
after the witness' testimony, that the errant cab was Green.
Denote Green cabs and Blue cabs by G and B, respectively, and
denote the testimony that the cab was green by g. Spelling
out Bayes' Theorem in full, we obtain:

\[
\Omega = \frac{P(G \mid g)}{P(B \mid g)}
       = \frac{P(g \mid G)}{P(g \mid B)} \cdot \frac{P(G)}{P(B)}
       = \frac{.8}{.2} \cdot \frac{.15}{.85}
       = \frac{12}{17},
\]

and thus $P(G \mid g) = \frac{12}{12 + 17} = .41$. Note that the prior odds are
based on the population base rates, whereas the likelihood ratio
is determined by the indicator.
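
The odds-form calculation for the cab problem can be checked with a short sketch. The function name `posterior_from_odds` is illustrative and not part of the original study; exact fractions are used so that the 12/17 odds are reproduced without rounding.

```python
from fractions import Fraction


def posterior_from_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    Returns the posterior odds and the corresponding posterior probability.
    """
    posterior_odds = prior_odds * likelihood_ratio
    posterior_probability = posterior_odds / (1 + posterior_odds)
    return posterior_odds, posterior_probability


# Cab problem: 15% Green vs. 85% Blue cabs (prior odds for Green),
# witness correct 80% of the time and wrong 20% of the time.
prior_odds = Fraction(15, 85)
likelihood_ratio = Fraction(80, 20)

odds, p_green = posterior_from_odds(prior_odds, likelihood_ratio)
print(odds)                      # posterior odds in favor of Green
print(round(float(p_green), 2))  # posterior probability that the cab was Green
```

Setting `prior_odds` to 1 (equally many Blue and Green cabs) returns a posterior of .80, which is the only case in which the witness' hit rate alone gives the right answer.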

If  a posterior  probability  of  41%  seems  counterintuitive 
to  you  and  your  initial  inclination  is  to  be  80%  sure  that  the 
witness'  testimony  of  Green  is  in  fact  reliable,  then  you  are 
exhibiting  the  base-rate  fallacy  — the  fallacy  of  allowing 
indicators  to  dominate  base  rates  in  your  probability  assessments. 
You  are,  however,  in  good  company.  The  base-rate  fallacy  has 
been  found  in  several  experimental  studies  (see  Section  II),  and 
it  manifests  itself  in  a multitude  of  real-world  situations. 

In  a 1955  paper,  Meehl  and  Rosen  warned  against  the 
insensitivity  of  both  the  designers  of  diagnostic  tests  and  their 
subsequent  users  to  base-rate  considerations,  and  their  proneness 
to evaluate tests by their hit rate (i.e., diagnosticity) alone,
rather  than  by  the  more  appropriate  measure  of  efficiency,  which 
would  take  into  account  base  rates,  as  well  as  costs,  goals,  and 
other relevant considerations. Clinicians are apparently unaware
that they should feel less confident when a test returns a rare
verdict (such as "suicidal") than when it returns a more common one.

Such  warnings  persist  to  our  day.  Lykken  (1975)  laments 
current  injudicious  use  of  polygraph  outputs  by  commercial 
companies,  while  demonstrating  that  even  a highly  accurate 
polygraph  reading  is  very  likely  to  yield  erroneous  diagnoses 
when,  say,  it  is  administered  to  a whole  population  of  employees, 
only  a fraction  of  whom  are  really  guilty  of  some  offense. 

Dershowitz (1971), Stone (1975) and McGargee (1976) point out
that since violence is a rare form of behavior in the population,
base-rate considerations alone make it more likely than not that
an individual who is preventively detained because he is judged
to be potentially dangerous is really quite harmless, a purely
statistical  argument  whose  significance  has  only  recently  gained 
appreciation  among  jurists. 

Base  rates  play  a problematic  role  in  yet  another  legal 
context,  namely  the  fact-finding  process.  Though  there  is  no 
definitive  ruling  on  the  status  of  base-rate  evidence,  courts 
are typically reluctant to allow its presentation, often ruling
it inadmissible on grounds of irrelevancy to the debated issues.
While some of the legal objections reflect sound reasoning,
others are clearly manifestations of the base-rate fallacy. (For
a discussion  of  base  rates  in  the  courts,  see  Tribe,  1971.) 

The counterpart of disregarding the probative impact of
base rates lies in overjudging the probative impact of indicators.
To hark back to a well-known children's riddle, white sheep eat more
grass than black sheep simply because there are more of them.
Color is really no indicator of appetite; the phenomenon is
a base-rate one, as is the fact that in 1957 in Rhode Island
more pedestrians were killed when crossing an intersection with
the signal than against it (Huff, 1959). An entire methodology
of experimental control has been conceived to guard against
this prevalent side effect of the base-rate fallacy.

The  base-rate  fallacy  may  underlie  some  phenomena  noted 
in  the  domain  of  interpersonal  perception  as  well.  Nisbett 
and  Borgida  (1975)  have  used  this  notion  to  explain  the 
perplexingly  minimal  role  that  consensus  information  typically 
plays  in  people's  causal  attributions,  consensus  data  being, 
in effect, base-rate data. The consequences of the base-rate
fallacy for interpersonal perception were also unwittingly
demonstrated by Gage (1955). Gage found that predicting the
questionnaire  behavior  of  strangers  drawn  from  a familiar 
population  deteriorated  following  an  opportunity  to  observe  these 
strangers  engaging  in  expressive  behavior.  If  we  suppose  (a) 
that  the  indicators  gleaned  from  these  observations  suppressed 
the  base-rate  information  which  was  previously  available  through 
the  familiarity  with  the  source  population  of  these  strangers; 
and  (b)  that  these  base-rate  considerations  were  more  diagnostic 
(i.e.,  more  extreme)  in  themselves  than  the  expressive  behavior 
was,  then  Gage's  results  are  readily  understood. 

1.2 Experimental Studies of the Base-Rate Fallacy

Although the existence of the base-rate fallacy has been
acknowledged for quite some while (Meehl & Rosen, 1955; Huff, 1959;
Good, 1968), it was first studied in a controlled laboratory
situation by Kahneman and Tversky (1973). These investigators
presented subjects with a series of short personality sketches
of people randomly drawn from a population with known composition.
On the basis of these sketches, subjects were to predict to which
of the population subclasses the described persons were most
likely to belong. Subjects were responsive to the diagnosticity
of the descriptions, but they were almost totally oblivious to
the fact that the different subclasses of the population were
of grossly different size. Therefore, subjects were as confident
when predicting membership in a small subclass (which
correspondingly enjoys a smaller prior probability) as in a larger
one. Kahneman and Tversky interpreted their results as showing
that:


... people predict by representativeness, that is,
they select ... outcomes by the degree to which
(they) represent the essential features of the
evidence ... However, because there are factors
(e.g., the prior probability of outcomes...) which
affect the likelihood of outcomes but not their
representativeness, ... intuitive predictions
violate the statistical rules of prediction
(pp. 237-238).

This  interpretation  explains  how  subjects  derive 
judgments  of  diagnosticity  from  personality  sketches,  but 
not  why  these  are  not  combined  with  base-rate  information.  That 
indicators  tend  to  dominate  base  rates  even  when  no  judgments  of 
representativeness  are  involved  is  evident  from  consideration 
of  problem  1,  with  which  this  paper  opens.  An  essentially 
identical  problem  was  presented  to  a total  of  147  subjects  in 
the  course  of  three  studies  (Kahneman  & Tversky,  1972b;  Lyon  & 
Slovic, 1976; Bar-Hillel, Note 1). The median and modal
assessments  given  by  these  subjects  were  80%,  compared  with 
the  correct  Bayesian  assessment  of  41%  as  computed  above  --  a 
clear  case  of  the  base-rate  fallacy. 

Another interpretation of Kahneman and Tversky's results
was offered by Nisbett, Borgida, Crandall and Reed (1975),
who suggested that base-rate information is ignored in favor of
target-case information, since the former is "remote, pallid
and abstract", whereas the latter is "vivid, salient and
concrete" (p. 24). Problem 1 again shows the phenomenon to
be more general than these authors may have realized.

Recent  investigations  have  addressed  themselves  to 
the  stability  of  the  base-rate  phenomenon  (Lyon  & Slovic,  1976; 
Bar-Hillel,  Note  1).  A wide  range  of  variations  of  the  basic 
problem  was  presented  to  a total  of  about  350  subjects.  These 
have included (a) changing the order of data presentation with
the indicator data preceding, rather than following, the
base-rate information; (b) using green rather than blue as majority
color; (c) having subjects assess the probability that the
witness erred, rather than the probability of correct
identification; (d) having the witness identify the errant
cab as belonging to the larger, rather than the smaller, of
the two companies; (e) varying the base rate (to 60% and
50%); (f) varying the witness' credibility (to 60% and 50% hits);
and (g) stating the problem in a brief verbal description
without explicit statistics (e.g., "most of the cabs in the
city are Blue", and "the witness was sometimes, but rarely,
mistaken in his identifications") (Kahneman & Tversky, Note 2).

Through  all  these  variations,  the  median  and  modal 
responses  were  consistently  based  on  the  indicator  alone, 
demonstrating  the  robustness  of  the  base-rate  fallacy.  It 
seems  that  people  ignore  base  rates  in  these  problems  for  the 
simple reason that they consider them irrelevant. In fact,
Lyon and Slovic (1976) presented subjects with a forced-choice
question regarding the relevance of the two items of information.
Subjects were offered reasoned statements in favor of (a)
only base rates being relevant; (b) only the indicator being
relevant; and (c) both being relevant. In spite of the fact
that the correct argument was explicitly formulated in (c),
50% of their subjects chose (b). In another study, Hammerton
(1973)  gave  his  subjects  a similar  kind  of  problem,  but  omitted 
the  base  rates  altogether.  His  subjects  showed  no  awareness 
that  a vital  ingredient  was  missing. 

The  present  study  views  the  subjective  judgment  of 
"relevancy"  as  a key  concept  for  understanding  the  base-rate 
fallacy.  It  includes  a series  of  problems;  some  were  designed 
to rule out alternative explanations of the phenomenon; others
were  designed  to  confirm  the  account  put  forth  by  the  author. 
Briefly,  this  account  suggests  that  people  order  informational 
items  according  to  their  perceived  relevance  to  the  required 
judgment.  More  relevant  items  dominate  less  relevant  ones. 
Items  are  combined  only  if  they  are  perceived  as  equally 
relevant. (A full presentation can be found in Section VII.)
Where  it  has  been  demonstrated,  the  base-rate  fallacy  is  a 
direct  result  of  base  rates  having  been  (subjectively)  less 
relevant  than  the  indicators.  This  study  will  show  that  by 
manipulating  relevancy,  the  fallacious  tendency  to  ignore  base 
rates can be controlled.

2. THE STUDY

2.1 Subjects and Method

The  empirical  core  of  this  paper  is  a collection  of 
inference  problems,  like  Problem  1,  which  were  presented  to 
about  1500  subjects  in  the  course  of  the  study.  These  subjects 
were  predominantly  Hebrew  University  applicants  who  answered 
the  problems  in  the  context  of  their  university  entrance  exams, 
and  thus  presumably  were  highly  motivated  to  do  their  best. 
Subjects  usually  received  only  one  problem,  but  occasionally 
two  or  three.  When  subjects  received  more  than  one  problem, 
these  were  chosen  to  be  quite  different  from  each  other,  so  as 
to  minimize  interference.  The  total  number  of  responses 
analyzed  approaches  3000.  Hebrew  University  applicants  are 
all  high  school  graduates,  mostly  18-23  years  old,  and  of  both 
sexes.  The  remainder  of  our  subjects  were  undergraduate 
volunteers.  Subjects  were  not  instructed  to  work  quickly, 
but  questionnaires  were  retrieved  after  about  4 minutes  (per 
question), and those who had not answered by then were simply
discarded.  Four  minutes  was  ample  time  for  an  overwhelming 
majority  of  the  subjects. 

In  all,  about  45  problems  were  employed,  only  seven  of 
which  will  be  presented  in  detail.  The  rest  will  be  only 
briefly  sketched. 

2.2  The  Cab  Problem 


Problem 1, with which we opened this paper, serves as
a point of departure for much of the discussion of the base-rate
phenomenon.

Figure 2-1 presents the distribution of estimates that
52 subjects gave to this problem. Thirty-six percent of
these  subjects  based  their  estimate  on  the  witness'  credibility 
alone  (.80),  ignoring  the  base  rate  altogether.  Eighty  percent 
was  also  the  median  estimate.  Only  about  10%  of  the  subjects 
gave  estimates  that  even  roughly  approximated  the  normative 
Bayesian  estimate  of  41%. 

The  same  pattern  of  results  was  obtained  with  the  whole 
spectrum  of  variations  described  in  Section  II.  The  modal 
answer,  which  invariably  matched  the  witness'  diagnosticity, 
was  given  by  up  to  70%  of  the  subjects. 

There  is  a very  seductive  argument,  applicable  to 
Problem  1,  which  would  generate  the  base-rate  fallacy.  It 
proceeds  as  follows:  our  witness  has  identified  the  errant 
cab's  color;  his  color  identifications  are  accurate  80%  of 
the  time;  ergo,  this  particular  identification  has  an  80% 
chance  of  being  accurate. 

The flaw in this argument is subtle. We happen to know
what color attribution the witness made, and it is a minority
color. Although the witness is perceptually unbiased in favor
of either color, the ecology is in fact reflected in his
identifications. By the formula of total probability, a
randomly selected cab in that city has a 71% (.8 x .85 + .2 x .15)
probability of being perceived as Blue by our witness, versus 29%
for Green. Moreover, a percept of Green is more likely to be
erroneously produced by a Blue cab (.85 x .2 = .17) than
appropriately produced by a Green one (.8 x .15 = .12).
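
The total-probability argument can be laid out numerically. The sketch below uses only the figures stated in Problem 1; the variable names are illustrative and not taken from the study.

```python
# Base rates of the two cab colors and the witness' accuracy (Problem 1).
p_blue, p_green = 0.85, 0.15
p_correct, p_wrong = 0.80, 0.20

# Total probability of each percept for a randomly selected cab.
seen_blue = p_correct * p_blue + p_wrong * p_green    # .68 + .03
seen_green = p_correct * p_green + p_wrong * p_blue   # .12 + .17

# Two ways a percept of Green can arise.
green_cab_seen_green = p_correct * p_green  # appropriate identification
blue_cab_seen_green = p_wrong * p_blue      # erroneous identification

# Probability that the cab is Green, given that it was perceived as Green.
p_green_given_seen_green = green_cab_seen_green / seen_green

print(round(seen_blue, 2), round(seen_green, 2))
print(round(blue_cab_seen_green, 2), round(green_cab_seen_green, 2))
print(round(p_green_given_seen_green, 2))
```

Because the erroneous route (.17) outweighs the appropriate one (.12), a report of Green is more often wrong than right, even though the witness is correct 80% of the time overall.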

In this figure, as in those to follow, the arrow
indicates the correct Bayesian estimate; Md stands for
Median; Mo stands for Mode; the number to the right of
the tallest line states the frequency of the modal
response.

FIGURE 2-1. DISTRIBUTION OF RESPONSES TO CAB PROBLEM

To return to Figure 1, it might be thought that though
subjects  err  in  falling  for  the  above  reasoning,  that  error 
is  not  a manifestation  of  the  base-rate  fallacy  at  all.  Since 
the  argument  relies  heavily  on  the  presence  of  direct,  though 
fallible,  testimony,  we  could  observe  what  happens  when  an 
indicator  that  does  not  lend  itself  to  the  same  argument  is 
substituted  for  the  witness. 

2.3  The  Suicide  Problem 


Such  an  attempt  is  to  be  found  in  Problem  2. 

Problem  2:  A study  was  done  on  causes  of  suicide  among 
young  adults  (aged  25  to  35).  It  was  found  that  the 
percentage  of  suicides  is  three  times  larger  among 
single  people  than  among  married  people.  In  this  age 
group,  80%  are  married  and  20%  are  single.  Of  100 
cases  of  suicide  among  people  aged  25  to  35,  how  many 
would  you  estimate  were  single? 

Formally,  this  problem  presents  the  same  two  items  of 
information  as  Problem  1.  There  is  base-rate  information 
regarding  marital  status,  and  diagnostic  information  regarding 
suicide  rates.  The  diagnostic  information,  however,  rather 
than  applying  directly  to  a specific  target  case,  is  itself 
a population  property  with  a distribution  of  its  own,  and 
derives  its  diagnostic  powers  by  virtue  of  having  different 
base  rates  in  the  two  population  subclasses. 

The  distribution  of  estimates  that  37  subjects  gave  to 
Problem  2 is  shown  in  Figure  2-2.  Forty-three  percent  of  the 
subjects gave a response based on the likelihood ratio alone
(75%), completely ignoring the fact that more young adults are
married than are single. The median response was also 75%.

A Bayesian estimate based on the given data gives the
answer as 43% (posterior odds = (.2/.8) x 3 = 3/4, hence a
probability of 3/7), but only six responses
fell between 30% and 50%.
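
The same odds-form computation applies to Problem 2. The sketch below is illustrative only: the helper name `share_single_among_suicides` is hypothetical, and it assumes that "three times larger" refers to the per-capita suicide rate, as the problem intends.

```python
from fractions import Fraction


def share_single_among_suicides(base_rate_single, rate_ratio):
    """Posterior share of singles among suicides.

    base_rate_single: proportion of single people in the age group.
    rate_ratio: suicide rate among singles divided by the rate among married.
    """
    prior_odds = base_rate_single / (1 - base_rate_single)
    posterior_odds = prior_odds * rate_ratio
    return posterior_odds / (1 + posterior_odds)


# Problem 2 as stated: 20% single, suicide rate three times larger among singles.
stated = share_single_among_suicides(Fraction(20, 100), 3)
print(stated, round(float(stated) * 100))  # expected singles per 100 suicides

# The modal answer of 75% would be correct only if singles and
# married people were equally numerous.
equal_groups = share_single_among_suicides(Fraction(50, 100), 3)
print(equal_groups)
```

The comparison makes the fallacy concrete: answering 75% amounts to treating the 80/20 split as if it were 50/50.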

To test for robustness, Problem 2 was subjected to a
host of variations, including (a) not mentioning the base
rates explicitly within the problem (presumably all our
subjects knew that a majority of adults aged 25 to 35 are
married); (b) asking subjects to supply, along with their
answers, estimates of the missing, but necessary, base rate (the
results of these estimates confirmed the assumption in [a]);
(c) varying the base rates (with the values 50%, 10%, and 5%);
(d) varying the likelihood ratio (3 and 9); (e) providing
purported "actual" suicide rates (5% and 15% of deaths) rather
than just the likelihood ratios; (f) inverting the indicator
to support the base-rate implication; (g) asking about the
chances that an individual suicide was single, rather than
for the number of singles among 100 suicides; and (h) changing
the cover story to deal with the differential dropout rates
among male and female students in the Hebrew University Medical
School. The base rate was varied in (c) by partitioning the
population into males vs. females; siblings vs. only children;
or people with a history of depression vs. "normal" people.
The likelihood ratio was presented as 9 in the depressives vs.
"normals" case (denoted Problem 2').

The  14  problems  produced  by  these  variations  did  not 
form  a factorial  design,  as  different  problems  incorporated 
different numbers of the listed variations. In all, they were
presented to some 600 subjects. The modal response was 75%
throughout (90% in Problem 2'). It was given by between 25%
and 80% of the respondents. The median response was 75%
in ten of the problems, 70% in three, and 80% in Problem 2'.

Interestingly, Problem 2 is subject to a slight
reformulation which normatively makes the base rates irrelevant.
Just read "the number of suicides is three times larger among
single people than among married people" for "the percentage ...".
That  subjects  were  not  merely  careless  in  reading  the  problem 
is  shown  by  the  similarity  of  their  response  pattern  when  the 
suicide  percentages  were  stated  explicitly.  In  general, 
"carelessness"  explanations  of  the  base-rate  fallacy  should 
not  be  pushed  too  far  unless  the  same,  or  highly  similar, 
confusions  can  account  for  all  the  results.  Finding  an  ad  hoc 
reformulation  is  too  much  like  finding  a question  to  fit  the 
answer.

2.4 Can People Integrate Uncertainties?

In light of our results so far, one might doubt that
people are capable of combining uncertainty from two sources.
To test this possibility, consider the following problem.

Problem 3: Two cab companies operate in a given
city, the Blue and the Green (according to the color
of the cab they run). 85% of the cabs in the city
are Blue, and 15% are Green.

A cab was involved in a hit-and-run accident
at night. There were two witnesses to the
accident. One claimed that the errant cab had
been Green, and the other claimed that it had
been Blue.

The court tested the witnesses' ability to distinguish
between Blue and Green cabs under nighttime visibility
conditions. It found the first witness (Green) able
to identify the correct color about 80% of the time,
confusing it with the other color 20% of the time; the
second witness (Blue) identified each color correctly
70% of the time, and erred about 30% of the time.

What do you think are the chances that the errant cab
was Green, as the first witness claimed?

Of  27  subjects  responding  to  Problem  3,  14  gave  an 
assessment  of  55%  (midway  between  the  assessments  implied  by 
each  witness  alone,  disregarding  base  rates),  and  all  but  one 
gave  assessments  between  50%  and  60%. 

In  Problem  3'  (not  reproduced  in  this  text)  both  witnesses 
identified  the  cab  as  Green.  Twenty-four  of  the  29  subjects 
answering  this  problem  gave  an  assessment  of  75%  — again,  midway 
between  the  two  witness-based  assessments.  While  still 
disregarding  the  base  rates,  our  subjects  appear  to  be  averaging 
the  probabilistic  implications  of  the  two  testimonies.  Although 
averaging  is  not  the  proper  way  to  calculate  the  joint  impact 
of  the  two  independent  testimonies  (which  is  to  reapply  Bayes' 
rule), it clearly indicates that both sources are considered.
(For an extended normative discussion, see Tversky & Kahneman,
in press.) Two symmetrical sources of uncertainty can be dealt
with simultaneously.
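
For comparison with the averaged responses, the normative answers to Problems 3 and 3' can be obtained by reapplying Bayes' rule once per witness. The sketch below assumes, as the text does, that the two testimonies are independent given the true color; the helper names are illustrative.

```python
from fractions import Fraction


def update(odds_green, hit_rate, says_green):
    """Apply one witness' testimony to the current odds in favor of Green."""
    likelihood_ratio = hit_rate / (1 - hit_rate)
    # A "Green" report multiplies the odds; a "Blue" report divides them.
    return odds_green * likelihood_ratio if says_green else odds_green / likelihood_ratio


def to_probability(odds):
    return odds / (1 + odds)


prior_odds = Fraction(15, 85)  # Green vs. Blue base rates
first, second = Fraction(80, 100), Fraction(70, 100)  # witnesses' hit rates

# Problem 3: first witness says Green, second says Blue.
p3 = to_probability(update(update(prior_odds, first, True), second, False))

# Problem 3': both witnesses say Green.
p3_prime = to_probability(update(update(prior_odds, first, True), second, True))

print(p3, round(float(p3), 2))
print(p3_prime, round(float(p3_prime), 2))
```

Under these assumptions the Bayesian values are roughly 23% and 62%, against the 55% and 75% that most subjects gave by averaging the two witness-based assessments.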

What if both items are base rates?

Problem 4: Consider the following statistics regarding
students of the School of Social Sciences at the Hebrew
University. 80% of the doctoral students in this
school are male. 70% of the students in the
Department of Sociology are female.

X is a doctoral student in the Department of
Sociology (within the School of Social Sciences).

What do you think are the chances that X is male?

Figure  2-3  displays  the  distribution  of  responses  given 
by  117  subjects  to  this  problem.  The  pattern  of  results  here 
is  somewhat  different  than  that  of  Problem  3,  its  dual  counterpart, 
particularly  in  that  about  40%  of  the  respondents  based  their 
estimate  on  one  item  only.  Note,  however,  that  no  one  item 
dominated  the  other;  in  fact,  subjects  were  equally  divided 
between them. But then, Problem 4 itself differs from Problem 3
in two important respects: (a) the information given in
Problem 4 is insufficient to determine a unique correct
response. In the absence of data or assumptions regarding the
joint distribution of the two variables, any response is
permissible, including 30% and 80%, the modal responses; this
is not true with Problem 3; (b) the two base rates don't appear
intuitively as equivalent as the two witnesses in Problem 3.
Apparently, some subjects considered field of studies more
important than degree sought, and others vice versa. In short,
the results here are less clear cut than in Problem 3, but it
is encouraging to note that the median response is 55% (midway
between the two base rates), and no one base rate enjoyed a clear
superiority.

2.5 Why are Base Rates Ignored?

The  problems  discussed  so  far  both  demonstrate  the 
generality of the base-rate fallacy, and exclude some possible
explanations for it. We have seen the failure of earlier
proposals, Kahneman and Tversky's representativeness, and
Nisbett and Borgida's saliency. We have seen that the effect
is not an artifact of the elicitation method (e.g., item order).
It clearly goes beyond simple misreading of the problem. It
cannot  be  attributed  to  inherent  inability  to  integrate  two 
sources  of  uncertainty.  Base  rates  are  apparently  ignored 
because  subjects  feel  they  should  be  ignored.  To  put  it  plainly, 
they  seem  irrelevant. 

It  is  important  to  note  that  base  rates  do  not  always 
seem  irrelevant.  In  fact,  when  they  are  the  only  information 
available,  they  are  clearly  utilized  (Kahneman  & Tversky,  1973; 
Lyon  & Slovic,  1976;  Bar-Hillel,  Note  1).  It  is  only  in  the 
presence  of  additional  information  that  base  rates  are  ignored. 

A possible  account  for  this  phenomenon  is  as  follows: 
people  order  information  by  its  perceived  degree  of  relevance 
to  the  problem  they  are  judging.  If  two  items  seem  equally 
relevant,  they  will  both  play  a role  in  determining  the  final 
estimate.  But  if  one  is  seen  as  more  relevant  than  the  other, 
the  former  will  dominate  the  latter  in  people's  judgments.  It 
needs  to  be  pointed  out  that  these  judgments  of  relevance  levels 
are  independent  of  quantitative  considerations,  i.e.,  an  item 
of  no  diagnostic  value  may  nevertheless  be  judged  more  relevant 
than an item of high diagnosticity. Judged diagnosticity will
affect  the  weights  assigned  to  different  items  only  within  levels. 
The  levels  themselves  are  crude,  almost  qualitative,  categories. 
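
The account just described can be caricatured in a few lines of code. This is only a toy sketch, not a model fitted in the study: the numeric relevance levels, the equal default weights, and the names `Item` and `intuitive_judgment` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    implied_probability: float  # estimate the item would suggest on its own
    relevance_level: int        # crude, qualitative rank (higher = more relevant)
    weight: float = 1.0         # weight within a level (assumed equal by default)


def intuitive_judgment(items):
    """Use only the items at the highest relevance level and average them."""
    top_level = max(item.relevance_level for item in items)
    used = [item for item in items if item.relevance_level == top_level]
    total_weight = sum(item.weight for item in used)
    return sum(item.weight * item.implied_probability for item in used) / total_weight


# Problem 1: the witness is judged more relevant than the base rate.
cab_problem = [
    Item("witness says Green (80% accurate)", 0.80, relevance_level=2),
    Item("15% of cabs are Green", 0.15, relevance_level=1),
]

# Problem 3: two witnesses at the same level; the second implies a 30%
# chance of Green because it reports Blue with 70% accuracy.
two_witnesses = [
    Item("first witness says Green (80% accurate)", 0.80, relevance_level=2),
    Item("second witness says Blue (70% accurate)", 0.30, relevance_level=2),
    Item("15% of cabs are Green", 0.15, relevance_level=1),
]

print(round(intuitive_judgment(cab_problem), 2))
print(round(intuitive_judgment(two_witnesses), 2))
```

With these assumed levels the sketch reproduces the modal responses reported above (80% for Problem 1 and 55% for Problem 3), while the base rate never enters the estimate.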

This  paper  does  not  offer  a theory  of  (subjective) 
relevance.  Indeed,  our  subjects  never  made  direct  relevance 
judgments. Rather, it suggests some item characteristics which
seem intuitively to affect perceived relevance, and the reader
is encouraged to assess the plausibility of the account by
his own intuition.

One suggestion is that case-specific information is
typically judged as more relevant than general considerations.
In Kahneman and Tversky's (1973) experiments, the case-specific
information  was  labeled  "individuating",  since  it  actually 
described  the  target  case.  In  Cab  Problem  1,  the  indicant 
information  is  case  specific  in  the  sense  that  the  witness  is 
testifying  as  to  the  color  of  the  very  cab  that  was  involved 
in  the  accident.  While  it  is  fallacious  to  ignore  base  rates 
in  these  problems,  specific  information  often  does  justifiably 
dominate  more  general  information.  For  example,  predicting 
life expectancy for a random newborn improves when the infant's
sex  or  weight  is  known. 

The  suicide  problem  (2)  differs  in  that  it  offers  two 
items  of  information  which  are  both  population  statistics. 

Why  does  one  nevertheless  dominate  the  other?  This  brings  us 
to  a second  suggestion:  Data  which  are  interpreted  as  relating 
causally  to  the  target  event  are  judged  as  more  relevant  than 
data which are seen as "mere statistics". The fact that more
young  adults  are  married  than  single  is  not  perceived  as 
causally  related  to  suicide,  but  the  difference  in  the  suicide 
rates  of  these  two  groups  readily  implies  a greater  propensity 
on  the  part  of  single  individuals  to  commit  suicide  than  on 
the  part  of  married  individuals.  Ajzen  (Note  3)  suggests  a 
similar  "causality  heuristic",  claiming  that  "people  rely  on 
information  perceived  to  have  a causal  relation  to  the  criterion, 
while disregarding valid but noncausal information" (p. 1).
He demonstrated, as we do below, that base rates which are made
to appear causally related to the target outcome do, in fact,
assume  a role  in  people's  predictions.  A similar  idea  is 
expressed in Tversky and Kahneman (in press).

The  idea  of  relevance  ranking  is  more  powerful  than  either 
specificity  or  causality,  since  it  can  account  for  the  fact 
that  the  same  information  (e.g.,  base  rate)  may  be  used  in  one 
context but ignored in another, depending on the informational
"competition".

We  proceed  now  to  review  some  experiments  designed  to 
test  one  implication  or  another  of  this  relevance  ranking  account. 

2.6  The  Dream  Problem 


Although  imposing  a differential  base  rate  on  an  existing 
dichotomy  (as  in  Suicide  Problem  2)  is  a powerful  way  of  inducing 
a causal  interpretation  of  data,  other  ways  exist,  as  seen  in 
problem  5. 

Problem  5:  Studies  of  dreaming  have  shown  that  80%  of 
adults  of  both  sexes  report  that  they  dream,  if  only 
occasionally,  whereas  20%  claim  they  do  not  remember 
ever  dreaming.  Accordingly,  people  are  classified  by 
dream  investigators  as  "Dreamers"  or  "Nondreamers". 

In  close  to  70%  of  all  married  couples,  husband  and 
wife  share  the  same  classification,  i.e.,  both  are 
Dreamers  or  both  are  Nondreamers,  whereas  slightly 
more  than  30%  of  couples  are  made  up  of  one  Dreamer  and 
one Nondreamer.

Mrs. X is a Nondreamer. What do you think are
the  chances  that  her  husband  is  also  a Nondreamer? 

In  this  problem,  two  base  rates  are  offered,  that  of 
dreaming  for  individuals,  and  that  of  matching  for  married 
couples.  The  target  case  is  a married  individual,  so  both 
base  rates  apply  to  him.  Ostensibly,  the  two  items  play 
analogous  roles.  Undoubtedly,  if  either  were  given  alone, 
it  would  have  determined  the  majority  of  responses.  In  fact, 
however, there is a marked asymmetry between the two items,
from  both  a formal  and  a psychological  point  of  view.  Formally 
speaking,  what  the  data  tell  us  is  that  mating  is  random.  We 
expect 64% of couples (.80 x .80) to be both Dreamers, and 4%
(.20 x .20) to be both Nondreamers, for a total of 68% (i.e.,
"close  to  70%").  Either  base  rate  is  equivalent  to  random 
mating,  given  the  other  base  rate.  Thus  a spouse's  classification 
is  entirely  irrelevant  — assessments  should  be  based  on  the 
dreaming  base  rate  alone.  Psychologically  speaking,  the  data 
seem  to  tell  the  converse  story.  Never  mind  the  individual 
base  rate  for  dreaming  — when  people  marry  they  tend  to  find 
similarly  classified  mates.  For  a married  target  case,  therefore, 
the  base  rate  for  matching  among  couples  should  predominate. 

That  this  is  indeed  so  can  be  seen  in  Figure  2-4. 
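
The formal side of the argument above can be checked numerically. The following is a minimal sketch in Python (the language and the function names are our own choices for illustration, not part of the study). It builds the joint distribution of wife and husband classifications under the assumption of independent mating, then recovers both the match rate and the conditional probability asked for in Problem 5.

```python
from itertools import product

def joint_under_random_mating(p_dreamer):
    """Joint distribution over (wife, husband) classes if mating is independent."""
    marginal = {"Dreamer": p_dreamer, "Nondreamer": 1.0 - p_dreamer}
    return {(w, h): marginal[w] * marginal[h]
            for w, h in product(marginal, repeat=2)}

def match_rate(joint):
    """Proportion of couples whose two members share a classification."""
    return sum(p for (w, h), p in joint.items() if w == h)

def p_husband_given_wife(joint, husband, wife):
    """Conditional probability of the husband's class given the wife's class."""
    p_wife = sum(p for (w, _), p in joint.items() if w == wife)
    return joint[(wife, husband)] / p_wife

joint = joint_under_random_mating(0.80)
print(round(match_rate(joint), 2))                                        # 0.68
print(round(p_husband_given_wife(joint, "Nondreamer", "Nondreamer"), 2))  # 0.2
```

Under independence the conditional probability equals the individual base rate of 20%, so the 68% match rate carries no information about the husband beyond that base rate.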

Two  additional  versions  of  Problem  5 were  presented  to 
52  and  49  subjects,  respectively.  In  the  first  version,  the 
spouse's  classification  was  given  as  Dreamer.  In  the  second 
version,  item  order  was  reversed.  The  same  median  and  mode 
of  70%  were  obtained. 

As  further  evidence  that  subjects  interpret  the  70% 
proportion  of  matches  as  reflecting  a tendency  for  individuals 
to marry alike, subjects were given yet another version of
Problem  5.  It  had  the  same  opening  paragraph,  but  then  went 
on  to  say: 


Problem  5':  ...with  respect  to  dreaming,  mating  is 
completely  random. 

Mrs.  X is  a Nondreamer.  What  do  you  think  are 
the  chances  that  her  husband  is  also  a Nondreamer? 

Some  other  formulations  were:  "the  classification  of 
husband  and  wife  was  found  to  be  independent"  and  "the  spouse's 
classification  was  found  to  have  no  predictive  validity".  The 
cover  story  was  also  changed,  to  couples  of  mother-daughter 
(rather than husband-wife). A total of 270 subjects saw
some  version  of  Problem  5'.  The  median  response  was  always 
50%.  The  modal  response  was  50%  in  five  questions,  and  20% 
in  two  others.  Naturally,  if  people  believe  that  50%  is  the 
expected  number  of  matched  pairs  under  conditions  of  random 
mating,  it  is  no  wonder  that  they  interpret  70%  matched  couples 
as  indicating  a tendency  to  marry  alike.  The  base-rate  fallacy 
is  not  limited  to  Bayesian  inferences. 

2.7 Assimilating Base Rates and Indicators

One  implication  of  our  proposed  account  is  that  by 
making  base  rates  and  indicators  seem  equally  relevant  to  the 
target  case,  the  dominance  of  one  by  the  other  would  give 
way  to  some  form  of  joint  influence.  We  now  describe  some 
attempts to do just that.

Problem 6: A large water-pumping facility is
operated simultaneously by two giant motors. The
motors  are  virtually  identical  (in  terms  of  model, 
age, etc.), except that a long history of breakdowns
in  the  facility  has  shown  that  one  motor,  call  it  A, 
was  responsible  for  85%  of  the  breakdowns,  whereas 
the  other,  B,  caused  15%  of  the  breakdowns  only. 

To  mend  a motor,  it  must  be  idled  and  taken  apart, 
an  expensive  and  drawn  out  affair.  Therefore, 
several  tests  are  usually  done  to  get  some  prior 
notion  which  motor  to  tackle.  One  of  these  tests 
employs  a mechanical  device  which  operates,  roughly, 
by  pointing  at  the  motor  whose  magnetic  field  is 
weaker.  In  4 cases  out  of  5,  a faulty  motor  creates 
a weaker  field,  but  in  1 case  out  of  5 this  effect 
may  be  accidentally  caused. 

Suppose a breakdown has just occurred. The
device is pointed at motor B. What do you
think  are  the  chances  that  motor  B is 
responsible  for  this  breakdown? 

As  in  the  Cab  Problem  1 and  other  instances  of  imperfect 
diagnosis,  we  have  here  a device  that  singles  out  a specific 
motor  as  the  likely  cause  of  a mechanical  failure.  However, 
the  present  base  rate  is  readily  interpreted  as  an  individual 
attribute of the two motors, implying that one motor, A, is in
worse  shape  than  the  other.  Thus,  both  the  base  rate  and 
indicator  single  out  a specific  suspect. 

As  can  be  seen  in  Figure  2-5,  the  pattern  of  results 
given  by  39  subjects  to  this  question  is  similar  to  that 
obtained in Problems 3 and 4. There is no prevailing strategy
and,  correspondingly,  no  assessment  favored  by  a large  proportion 
of  subjects,  producing  the  diversity  of  responses  characteristic 
of  problems  in  which  the  base-rate  fallacy  is  not  manifest. 
However,  over  60%  of  the  subjects  gave  assessments  interpretable 
as  weighted  averages  of  the  two  items  of  information  (i.e., 
they  lie  strictly  between  15%  and  80%,  the  assessments 
corresponding to the individual items), and the median of the
distribution  is  at  40%,  remarkably  close  to  the  correct  Bayesian 
posterior  of  41%. 
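
The 41% figure follows directly from Bayes' rule. As a minimal sketch (the function and parameter names are hypothetical, and we assume the device's "4 cases out of 5" can be read as an 80% hit rate with a 20% false-alarm rate):

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(H | signal) for a binary hypothesis H, computed by Bayes' rule."""
    numerator = prior * hit_rate
    return numerator / (numerator + (1.0 - prior) * false_alarm_rate)

# Problem 6: motor B caused 15% of past breakdowns. The device is assumed to
# point at the faulty motor in 4 cases out of 5 and at the sound one in 1 of 5.
p_motor_b = posterior(prior=0.15, hit_rate=0.80, false_alarm_rate=0.20)
print(round(p_motor_b, 2))  # 0.41
```

The two single-item answers, 15% (base rate alone) and 80% (indicator alone), bracket this value, which is why responses strictly between them can be read as some form of joint weighting.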

In  the  following  problem,  the  strategy  for  assimilating 
base  rates  and  indicators  was  reversed. 

Problem 7: Two cab companies operate in a given
city, the Blue and the Green (according to the color
of cab they run). Eighty-five percent of the cabs in
the  city  are  Blue,  and  15%  are  Green.  A cab  was 
involved  in  a hit-and-run  accident  at  night,  in  which 
a pedestrian  was  run  down.  The  wounded  pedestrian 
later  testified  that  though  he  did  not  see  the  color 
of  the  cab,  due  to  the  bad  visibility  conditions  that 
night,  he  remembers  hearing  the  sound  of  an  intercom 
coming  through  the  cab  window.  The  police  investigation 
discovered  that  intercoms  are  installed  in  80%  of  the 
Green  cabs,  and  in  20%  of  the  Blue  cabs. 

What  do  you  think  are  the  chances  that  the  errant 
cab  was  Green? 

Figure  2-6  shows  the  distribution  of  35  subjects' 
responses to this problem. Here an attribute was chosen which,
though nonuniformly distributed between the two population
subclasses,  is  hard  to  conceive  of  causally.  It  is  more 
naturally  thought  of  as  mere  statistical  coincidence,  much  in 
the  manner  in  which  base  rates  are  typically  construed.  The 
median  response  is  48%,  close  to  the  correct  41%.  We  again 
encounter  a somewhat  flat  distribution,  with  no  one  prevalent 
response. 

Thus,  either  by  increasing  the  relevance  of  base  rates 
to  indicator  level,  or  decreasing  the  relevance  of  indicators 
to  base-rate  level,  the  two  can  be  caused  to  combine. 

Problem  6 was  one  of  five  problems  run  in  this  study  in 
which  base  rates  were  applied  to  individual  cases.  They  all  used 
the  same  parameters  and  format  of  presentation,  and  differed  in 
cover  story  only.  Likewise,  Problem  7 is  one  of  two  problems 
employed.  The  following  table  summarizes  the  results  of  all 
seven  variants.  Problems  6 and  7 of  the  text  appear,  respectively, 
as  6A  and  7B  in  Table  2-1. 




TABLE 2-1.  RESULTS FOR PROBLEMS 6 & 7, AND THEIR VARIATIONS

                             Problem 6                         Problem 7
Problem            A       B       C       D       E        A       B          Overall

No. of Ss          39      46      28      67      39       23      [?]        220

Median
Assessments        40      60      38      68      75       48      42         60

Modal
Assessments        15(16)  80(8)   20(7)   80(18)  80(11)   30(5)   42,80(4)   80(44)
(No. of Ss)
3. DISCUSSION


3.1 Further Directions For Research

A series of seven problems is presented in detail in this
paper,  drawn  from  a larger  pool  of  45  problems.  The  problems 
are  presented  in  a sequence  that  reflects  the  historical 
development  of  the  study:  attempts  to  establish  the  robustness 
of  the  base  rate  fallacy,  followed  by  a search  for  a "pure" 
example  which  would  be  impervious  to  some  possible  accounts  of 
the  fallacy,  this  in  turn  leading  to  the  emergence  of  the  account 
this  paper  propounds,  and  culminating  in  some  examples,  tailored 
by  the  implications  of  this  account,  which  demonstrate  how  base 
rates  can  influence  subjective  probabilities. 

The  problems  studied  can  be  roughly  divided  into  two  groups. 
The  first  group  contains  problems  in  which  one  item  dominated 
another  (1,  4,  5).  These  problems  are  characterized  by  a relatively 
high  degree  of  consensus  among  subjects,  with  responses  converging 
on  the  indicator-implied  estimate.  The  problems  in  the  second 
group,  on  the  other  hand  (2,  3,  6 and  7),  yielded  flatter,  less 
elegant  distributions,  with  two  or  more  modes  (Problem  2 is  an 
exception). They are more aptly described as having no apparent
dominance  rather  than  as  problems  in  which  a well  defined 
integration  policy  emerged.  This  latter  group  represents  an 
exercise  in  designing  questions  which  would  induce  subjects  to 
interpret  particular  data  in  ways  that  make  them  appear  more  or 
less  relevant.  The  study  offers  neither  a systematic  theory  of 
judged  relevance,  nor  any  predictions  as  to  how  items  which  are 
equally  relevant,  but  not  necessarily  equally  diagnostic,  would 
be  combined.  These  gaps  indicate  directions  for  future  research, 
with a more systematic set of problem types.

Another fascinating research avenue, albeit formal rather
than  empirical,  is  in  analyzing  the  normative  way  of  combining 
uncertainties.  Bayes'  theorem  provides  a model  for  integration 
in  some,  but  not  all,  conditions.  There  are  intriguing  problems 
surrounding  both  the  issue  of  specificity  and  the  issue  of 
causality.  For  example,  if  you  knew  the  base  rate  of  Blue  cabs 
in  the  quarter  of  town  in  which  the  accident  occurred,  it  seems 
legitimate  to  substitute  that  for  the  base  rate  of  Blue  cabs  the 
city  over.  But  if  the  more  specific  base  rate  is  only  an  estimate, 
the  overall  base  rate  cannot  be  discarded.  As  to  causality,  if 
your  statistics  show  that  the  presence  of  a diagnostic  cue  implies 
a certain  state  with  a certain  probability,  the  base  rate  for 
that  state  becomes  immaterial  when  the  cue  is  present.  But  if 
all  you  know  is  the  probability  with  which  the  state  implies  the 
cue,  the  base  rate  of  the  state  remains  crucial  even  in  the 
presence  of  the  cue. 
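
The contrast between the two directions of conditioning can be illustrated with a short sketch. The numbers and names below are hypothetical, and the false-alarm probability (the chance of the cue when the state is absent) is an assumption added for the illustration, since knowing only how often the state produces the cue is not by itself enough to apply Bayes' theorem.

```python
def posterior_from_likelihoods(base_rate, p_cue_given_state, p_cue_given_not_state):
    """P(state | cue) when only the cue probabilities given each state are known."""
    joint_state = base_rate * p_cue_given_state
    joint_not_state = (1.0 - base_rate) * p_cue_given_not_state
    return joint_state / (joint_state + joint_not_state)

# The same cue (0.80 if the state holds, 0.20 if it does not) leads to very
# different conclusions as the base rate of the state changes.
for base_rate in (0.15, 0.50, 0.85):
    post = posterior_from_likelihoods(base_rate, 0.80, 0.20)
    print(f"base rate {base_rate:.2f} -> P(state | cue) = {post:.2f}")
# base rate 0.15 -> P(state | cue) = 0.41
# base rate 0.50 -> P(state | cue) = 0.80
# base rate 0.85 -> P(state | cue) = 0.96
```

If, instead, the available statistic were P(state | cue) itself, estimated in the population at hand, no such computation would be needed and the base rate would drop out of the judgment.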

Furthermore,  there  is  a question  as  to  how  two  uncertainties 
should  be  combined  when  both  are  relevant.  If  each  of  two  items 
points  to  a certain  outcome  with  a certain  probability,  should 
their  combined  impact  lead  to  an  estimate  which  is  some  average 
of  the  two,  or  should  it  be  more  extreme  than  either  individual 
estimate?  How  is  this  affected  by  priors?  By  lack  of  conditional 
independence  of  the  two  estimators?  (See  a discussion  of  some 
of  these  issues  in  Tversky  & Kahneman,  1976,  in  press.) 
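
One special case of these questions can be worked out exactly. The sketch below assumes, purely for illustration, that the two items are conditionally independent given the outcome and that each item, taken alone with the prior, would yield the stated probability. Under those assumptions the posterior odds are the product of the two single-item odds divided by the prior odds. The function names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds in favor back to a probability."""
    return o / (1.0 + o)

def combine(p1, p2, prior):
    """Posterior from two cues that are conditionally independent given the
    outcome, where p1 and p2 are the posteriors each cue yields on its own."""
    return prob(odds(p1) * odds(p2) / odds(prior))

print(round(combine(0.70, 0.70, prior=0.50), 3))  # 0.845
print(round(combine(0.70, 0.70, prior=0.70), 3))  # 0.7
print(round(combine(0.70, 0.70, prior=0.90), 3))  # 0.377
```

With even priors the combined estimate is more extreme than either item. With a prior of 0.70 the items add nothing, and with a prior of 0.90 each item is actually evidence against the outcome, so the combination falls below both. Neither averaging nor extremity is correct in general, and without conditional independence this simple product rule no longer applies.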

3.2 Other Views of Information Integration

Two  major  schools  have  made  extensive  studies  of  information 
processing  in  Bayesian  inference  tasks:  the  Bayesian  approach 
(Slovic  & Lichtenstein,  1971)  and  integration  theory  (Anderson, 
1972). This study is at variance with one central concept of each.

The Bayesian approach: a major finding is that people
are  conservative  probability  revisers,  i.e.,  when  asked  to  judge 
which  of  two  binomial  populations  is  more  likely  to  have  yielded 
a given  sample,  they  almost  invariably  give  estimates  which  are 
less  extreme  than  indicated  by  Bayes'  rule.  Nonetheless,  for 
a long time many researchers were content to conclude that
"the subjects' revision rule is essentially Bayes' theorem"
(Beach, 1966, p. 6; see also Edwards, 1968; Peterson & Beach,
1967; Schum & Martin, 1968). Around 1972, the Bayesian approach
came  under  attack  from  two  directions:  integration  theory,  and 
the  judgment-heuristics  approach  of  Kahneman  and  Tversky. 

Kahneman  and  Tversky  (1972a)  claimed  that  the  Bayesian  model 
failed  to  capture  the  most  essential  determinants  of  the  judgmental 
process  it  purported  to  describe,  and  that  subjects,  rather  than 
being  conservative  Bayesians,  were  in  fact  not  Bayesian  at  all. 

By  choosing  tasks  carefully,  Kahneman  and  Tversky  showed  that 
people's  estimates  in  Bayesian  tasks  need  not  even  be  monotonically 
related  to  the  true  Bayesian  estimates.  However,  since  they  used 
the  same  type  of  task  in  their  study,  numerically  speaking  they 
too obtained conservative assessments of data diagnosticity. In
contrast,  studies  of  the  base-rate  fallacy  readily  yield  radical 
results,  i.e.,  probability  revisions  more  extreme  than  allowed 
by  Bayes'  rule.  In  fact,  by  controlling  the  diagnosticity  of  the 
indicator  (whether  explicitly,  as  in  the  cab  problem,  or  implicitly 
as in the Kahneman and Tversky 1973 studies) vis-à-vis the base
rate,  one  can  achieve  conservatism  or  radicalism  at  will.  Thus 
"conservatism"  not  only  isn't  a property  of  people's  probability 
revisions,  it  isn't  even  a property  of  their  judgments  of 
diagnosticity.  The  whole  finding  is  a fluke  of  the  paradigm  used 
by the Bayesian approach. Conservatism is a "non effect"
(Anderson, 1972).

The integration theory approach: a basic assumption of
information-integration theory, the most unified and comprehensive
approach embodying the "time honored ... conception of the organism
as an integrator of stimulus information ... in judgment"
(Anderson, 1972, p. 3), is that if a series of stimuli each has
relevance  for  a particular  judgmental  task,  then  the  combined 
effect  of  the  series  upon  the  response  can  be  described  by  a model 
which  assigns  each  valued  stimulus  an  appropriate  (subjective) 
weight,  roughly  corresponding  to  its  impact  upon  the  response. 
Typically,  an  additional  assumption  of  independence  is  introduced, 
according  to  which  the  weight  (though,  of  course,  not  necessarily 
the relative weight) of a stimulus is independent of the other
stimuli  with  which  it  is  combined.  Integration  theory  has  been 
applied  to  a variety  of  judgmental  tasks,  including  the  Bayesian 
inference  tasks  discussed  above.  Shanteau  (1970,  1972)  compared 
the  integration  theory  and  the  Bayesian  approach  to  these  tasks, 
and  concluded  that  integration  theory  gives  a superior  account 
of  subjects'  behavior. 

In  one  study,  Shanteau  and  Anderson  (1972)  found  that  when 
judging  the  value  of  diagnostic  information  in  a task  in  which 
subjects  had  an  initial  probability  P of  winning  a sum  of  money, 
they  were  willing  to  pay  more  for  an  item  of  fixed  diagnosticity 
the  lower  P was,  thereby  indicating  a sensitivity  of  sorts  to 
prior  probabilities  even  under  conditions  of  constant  diagnosticity. 
This  result  seems  incompatible  with  the  base-rate  fallacy.  However, 
upon  closer  examination  of  Shanteau  and  Anderson's  tasks,  this 
appears  not  to  be  the  case.  While  their  subjects  were  willing 
to  pay  more  for  an  indicator  when  it  was  more  needed,  i.e., 
when  the  initial  probability  of  success  is  lower,  they  seemed 
unaware  that  they  were  in  some  cases  paying  for  a worthless 
commodity,  namely  for  information  which  should  have  in  no  way 
affected their response. In other words, subjects were willing
to  pay  more  for  more  diagnostic  information,  even  when  this 
additional  diagnosticity  when  combined  with  the  prior  should 
have  had  no  impact  on  their  guessing  strategy.  Had  subjects 
been  asked  to  evaluate  posterior  probabilities  in  this  situation, 
I suggest  that  their  posterior  estimates  would  have  manifested 
the same insensitivity to priors typical of our subjects. Some
support  for  this  position  is  given  indirectly  by  the  following 
data,  gathered  in  a kind  of  "thought  experiment"  modelled  after 
Edwards  (1968).^  Fifty-four  subjects  were  given  the  following 
problem: 

Imagine  ten  urns  full  of  red  and  blue  beads.  Eight  of 
these  urns  contain  a majority  of  blue  beads,  and  will 
be  referred  to  hereafter  as  the  Blue  urns.  The  other 
two  urns  contain  a majority  of  red  beads,  and  will  be 
referred  to  hereafter  as  the  Red  urns.  The  proportion 
of  the  majority  color  in  each  urn  is  75%.  Suppose 
someone  first  selects  an  urn  on  a random  basis,  and 
then  blindly  draws  four  beads  from  the  urn.  Three 
of  the  beads  turn  out  to  be  blue,  and  one  red. 

What  do  you  think  is  the  probability  that 
the  beads  were  drawn  from  a Blue  urn? 

In  other  versions  of  this  question,  the  number  of  Blue 
urns  was  given  as  five  out  of  the  total  ten,  and/or  the  number 
of  blue  beads  in  the  sample  was  given  as  one.  Results  are 
presented  in  Table  3-1. 
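
For reference, the normative answers to the four versions can be computed as follows. This is a sketch under one assumption not stated in the problem: the four draws are treated as independent (sampling with replacement, or from an urn large enough that this makes no difference). The function and parameter names are hypothetical.

```python
from math import comb

def p_blue_urn(n_blue_urns, blue_in_sample, n_urns=10, sample_size=4, majority=0.75):
    """Posterior probability that the sample came from a Blue urn.

    Assumes independent draws, so the sample likelihood is binomial.
    """
    prior = n_blue_urns / n_urns
    k, n = blue_in_sample, sample_size
    like_blue = comb(n, k) * majority**k * (1 - majority)**(n - k)
    like_red = comb(n, k) * (1 - majority)**k * majority**(n - k)
    return prior * like_blue / (prior * like_blue + (1 - prior) * like_red)

for n_blue_urns in (8, 5):
    for blue_in_sample in (3, 1):
        p = p_blue_urn(n_blue_urns, blue_in_sample)
        print(f"{n_blue_urns} Blue urns, {blue_in_sample} blue beads: {p:.3f}")
# 8 Blue urns, 3 blue beads: 0.973
# 8 Blue urns, 1 blue beads: 0.308
# 5 Blue urns, 3 blue beads: 0.900
# 5 Blue urns, 1 blue beads: 0.100
```

Changing the number of Blue urns from five to eight shifts the normative answer in both sample conditions, so identical estimates across those two conditions indicate that the prior was not used.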

The  appearance  of  identical  modal  estimates  in  the  first 
two  rows  and  in  the  second  two  rows  reflects  insensitivity  to 
priors, i.e., the base-rate fallacy. The complementation of the
modal assessments in rows 1 and 3 and in rows 2 and 4 can be
seen  as  a replication  and  support  of  Kahneman  and  Tversky's 
claim  that  people  judge  diagnosticity  via  representativeness. 
Incidentally, note that in row 3 we have a case of "radicalism"
rather  than  conservatism.  Clearly,  no  integration  model  can 
account  for  these  results. 

One  striking  result  of  the  present  study,  which  would 
be  difficult  to  handle  within  integration  theory,  is  the  fact 
that  precisely  the  same  item  of  information  (e.g.,  the  base  rate 
of  Blue  cabs)  is  treated  differently  when  coupled  with  a more 
relevant  additional  item  (e.g.,  the  witness'  testimony)  and  when 
coupled  with  an  equally  relevant  item  (e.g.,  the  intercom 
distribution)  — in  spite  of  both  additional  items  being  formally 
equivalent.  In  other  words,  we  have  here  a very  strong  context 
effect,  wherein  the  weight  assigned  to  one  item  depends  very 
clearly  on  the  nature  of  the  item  with  which  it  is  coupled. 

Knowing  the  isolated  impact  of  two  individual  items  on  subjects' 
judgments  does  not  allow  us  to  predict  their  weights  in  combination. 
Once  the  weight  of  an  item  depends  not  only  on  algebraic 
considerations  but  on  the  way  its  relationship  to  the  criterion 
is  interpreted  (with  this  interpretation  being  open  to  external 
manipulation), the integration theory approach here receives
a distinctly  ad  hoc  flavor. 

Psychologists  are  familiar  with  the  fact  that  as  information 
is  added  in  a probabilistic  inference  task,  confidence  increases 
rapidly, whereas accuracy increases only minimally (Oskamp, 1965),
if  at  all.  This  study  shows  that  new  information  may  actually 
lead  to  a decline  in  predictive  performance,  by  suppressing 
existing  information  of  greater  predictive  validity.  In  the 
mind of the human judge, more is not always superior to less.

It is interesting to note that only under conditions of equal
base  rates  does  the  claim  that  each  color  has  an  equal  chance 
of  being  identified  properly  entail  the  claim  that  each  color 
attribution  has  an  equal  chance  of  turning  out  to  be  correct. 
Whereas  the  former  tells  us  only  that  the  witness'  perceptions 
are  unbiased,  any  realization  of  the  latter  would  call  for 
a very  complex  system  of  response  biases  on  the  part  of  the 
witness,  varying  with  the  population  base  rate.  It  is  for  this 
reason,  of  course,  that  the  diagnosticity  of  indicators  is 
typically stated in terms of their Hit and correct-Reject rates,
rather  than  in  terms  of  their  efficiency  as  Meehl  and  Rosen 
would  have  it.  It  is  the  former,  but  not  the  latter,  which, 
being  a constant  feature  of  the  indicator,  remains  invariant 
under fluctuating base rates, costs, etc.

^According to the Israel Bureau of Statistics, 85% of the 25-35
age  group  in  Israel  (where  this  study  was  run)  are  married. 
However,  since  subjects  estimate  this  proportion  as  80%  (median 
and  modal  response  of  32  Ss,  with  an  interquartile  range  of 
70%-80%), we used a proportion conforming to their guess.

The situation described in this question, as in all others, is
strictly fictional. However, an attempt was made at credibility
throughout.

^One could argue that the effect achieved here is linking the
base  rates  causally  to  breakdowns,  rather  than  making  it  case 
specific.  Possibly  both  happen,  and  either  is  compatible  with 
the  hypothesis  studied. 

^The  idea  for  modelling  a thought  experiment  after  Edwards  was 
suggested  to  me  by  Lyon's  (1973,  Note  4)  unpublished  master's 
thesis.  I refer  to  it  as  a "thought  experiment"  since  in  the 
original  Edwards  study,  the  assessments  were  made  on  real  urns, 
beads,  and  samples. 